A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.
Each block is generally, but not always, meant to supply glyphs used by one or more specific languages, or in some general application area such as mathematics, surveying, decorative typesetting, social forums, etc.
Blocks are pairwise disjoint; that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in hexadecimal notation, the starting (smallest) point is U+xxx0 and the ending (largest) point is U+yyyF, where xxx and yyy are three or more hexadecimal digits. (These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 rows labeled with the last hexadecimal digit of the code point.) The size of a block may range from a minimum of 16 to a maximum of 65,536 code points.
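The alignment rules above can be checked mechanically. The following is a minimal sketch, not an official validation routine; the function names `is_valid_block_range` and `chart_columns` are hypothetical and chosen only for illustration.

```python
def is_valid_block_range(start: int, end: int) -> bool:
    """Check an inclusive code point range against the rules stated above."""
    size = end - start + 1
    return (
        start % 16 == 0          # first code point ends in hex digit 0
        and size % 16 == 0       # so the last code point ends in hex digit F
        and 16 <= size <= 65536  # size bounds given in the text
    )


def chart_columns(start: int, end: int) -> int:
    """Number of 16-row columns needed to chart the range."""
    return (end - start + 1) // 16


# Basic Latin, U+0000..U+007F: aligned, 128 code points, 8 chart columns.
print(is_valid_block_range(0x0000, 0x007F), chart_columns(0x0000, 0x007F))

# Supplementary Private Use Area-A, U+F0000..U+FFFFF: the maximum size.
print(is_valid_block_range(0xF0000, 0xFFFFF))

# A range that neither starts on ...0 nor ends on ...F fails the check.
print(is_valid_block_range(0x3D2E, 0x44B7))
```

Because both the start and the size are multiples of 16, every chart column is complete, which is what makes the 16-row table layout work.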
Every assigned code point has a property called "Block", whose value is a character string naming the unique block that owns that point. However, a block may also contain unassigned code points, usually reserved for future additions of characters that "logically" should belong to that block. Code points not belonging to any of the named blocks, e.g. in the unassigned planes 4–13, have the Block value "No_Block".
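Since blocks are disjoint ranges, the Block value of a code point can be found with a binary search over the block start points. The sketch below assumes a tiny hand-picked excerpt of the block list (`BLOCKS`) rather than the full data from the Unicode Character Database, and `block_of` is a hypothetical helper name. With this partial table, code points in blocks that are not listed also fall through to "No_Block", which would not be the case with the complete list.

```python
from bisect import bisect_right

# Illustrative excerpt only: (first, last, name), sorted by first code point.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
]
_STARTS = [first for first, _last, _name in BLOCKS]


def block_of(code_point: int) -> str:
    """Return the block name for a code point, or "No_Block"."""
    # Find the last block whose start is <= code_point.
    i = bisect_right(_STARTS, code_point) - 1
    if i >= 0:
        _first, last, name = BLOCKS[i]
        if code_point <= last:
            return name
    return "No_Block"


print(block_of(ord("A")))   # an assigned character
print(block_of(0x0378))     # unassigned, but still inside a block
print(block_of(0x40000))    # plane 4: outside every named block
```

The second call illustrates that a code point can belong to a block without being assigned to any character.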
Belonging to a particular Unicode block does not guarantee any particular properties of the characters that the block contains or is expected to contain. The identity of any character is determined by its properties as stated in the Unicode Character Database. For example, the contiguous range of 32 noncharacter code points U+FDD0..U+FDEF shares none of the properties common to the other characters in the Arabic Presentation Forms-A block: they are not Arabic script characters or "right-to-left noncharacters", and are assigned there as filler, given that it has been agreed that no further Arabic compatibility characters will be encoded.
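The difference can be observed with Python's standard `unicodedata` module, which exposes some properties from the Unicode Character Database. This is only a sketch: the exact database version depends on the Python release, and the comparison character U+FDF2 is an arbitrary choice of an assigned character from the same block.

```python
import unicodedata

# The 32 noncharacter code points U+FDD0..U+FDEF.
NONCHARACTERS = range(0xFDD0, 0xFDF0)

# Collect the general category of each one; all report "Cn"
# (not assigned to any character) and none has a character name.
categories = {unicodedata.category(chr(cp)) for cp in NONCHARACTERS}
names = {unicodedata.name(chr(cp), None) for cp in NONCHARACTERS}
print(len(NONCHARACTERS), categories, names)

# An assigned character from the same block, for comparison.
neighbour = "\uFDF2"
print(unicodedata.name(neighbour))
print(unicodedata.category(neighbour))       # a letter category
print(unicodedata.bidirectional(neighbour))  # an Arabic right-to-left class
```

Both sets of code points lie inside Arabic Presentation Forms-A, yet only the assigned character carries letter and right-to-left properties.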
Each code point also has a script property, specifying which writing system it is intended for, or whether it is intended for multiple writing systems. This, also, is independent of block.
In descriptions of the Unicode system, a block may be subdivided into more specific subgroups, such as the "Chess symbols" in the Miscellaneous Symbols block (not to be confused with the separate Chess Symbols block). Those subgroups are not "blocks" in the technical sense used by the Unicode Consortium, and are named only for the convenience of users.
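Other classifications
List of blocks
Moved blocks
Before Unicode 2.0, the following former blocks were moved:

Former Unicode blocks from before Unicode 2.0

| Code point range | Block name | Added | Removed | Current block(s) in same range | Characters now located in | Code points | Assigned characters | Script |
|---|---|---|---|---|---|---|---|---|
| U+1000..U+105F | Tibetan | 1.0.0 | 1.0.1 | Myanmar | Tibetan | 96 | 71 | Tibetan script |
| U+3400..U+3D2D | Hangul | 1.0.0 | 2.0 | CJK Unified Ideographs Extension A | Hangul Syllables | 2350 | 2350 | Hangul |
| U+3D2E..U+44B7 | Hangul Supplementary-A | 1.1 | 2.0 | CJK Unified Ideographs Extension A | Hangul Syllables | 1930 | 1930 | Hangul |
| U+44B8..U+4DFF | Hangul Supplementary-B | 1.1 | 2.0 | CJK Unified Ideographs Extension A and Yijing Hexagram Symbols | Hangul Syllables | 2376 | 2376 | Hangul |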
External links